Audiovisual-to-articulatory inversion
Authors
Abstract
It has been shown that acoustic-to-articulatory inversion, i.e. estimation of the articulatory configuration from the corresponding acoustic signal, can be greatly improved by adding visual features extracted from the speaker’s face. For the inversion method to be usable in a realistic application, it should be possible to obtain these features from a monocular frontal face video, without requiring the speaker to wear any special markers. In this study, we investigate the importance of visual cues for inversion. Experiments with motion capture data of the face show that important articulatory information can be extracted using only a few face measures that mimic the information that could be gained from a video-based method. We also show that the depth cue for these measures is not critical, which means that the relevant information can be extracted from a frontal video. A real video-based face feature extraction method is further presented, leading to similar improvements in inversion quality. Rather than tracking points on the face, it represents the appearance of the mouth area using independent component images. These findings are important for applications that need a simple audiovisual-to-articulatory inversion technique, e.g. articulatory phonetics training for second language learners or hearing-impaired persons. © 2008 Elsevier B.V. All rights reserved.
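The core comparison in the abstract (audio-only vs. audio-plus-visual linear inversion) can be sketched in a few lines. This is a minimal illustration with synthetic, randomly generated features, not the paper's data or method: the dimensions, the generated linear map `W_true`, and the helper `linear_inversion` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 frames of 16-dim acoustic features, 8-dim facial
# measures, and 4 articulatory parameters produced by a known linear map
# (all dimensions are illustrative assumptions, not the paper's setup).
n, da, dv, dy = 200, 16, 8, 4
A = rng.normal(size=(n, da))            # acoustic features
V = rng.normal(size=(n, dv))            # facial measures
W_true = rng.normal(size=(da + dv, dy))
Y = np.hstack([A, V]) @ W_true + 0.01 * rng.normal(size=(n, dy))

def linear_inversion(X, Y):
    """Least-squares linear estimator of articulatory parameters from X."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

# Compare inversion error with and without the visual features.
W_a = linear_inversion(A, Y)
W_av = linear_inversion(np.hstack([A, V]), Y)
rmse_a = np.sqrt(np.mean((A @ W_a - Y) ** 2))
rmse_av = np.sqrt(np.mean((np.hstack([A, V]) @ W_av - Y) ** 2))
print(rmse_av < rmse_a)  # adding visual features lowers the error here
```

Because the synthetic articulatory targets genuinely depend on the visual features, the audio-only estimator cannot recover them, mirroring the qualitative finding that facial measures carry complementary articulatory information.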
Similar papers
Synthesis of fricative consonants by audiovisual-to-articulatory inversion
We present here results of audio-visual to articulatory inversion for French fricatives embedded in VCVs. The inversion technique is evaluated using both experimental and synthetic data. The final synthesis is assessed by a perceptual categorisation test. Synthetic stimuli obtain scores similar to natural ones.
Evaluation of speech inversion using an articulatory classifier
This paper presents an evaluation method for statistically based speech inversion, in which the estimated vocal tract shapes are classified into phoneme categories based on their articulatory correspondence with prototype vocal tract shapes. The prototypes are created using the original articulatory data, and the classifier hence permits interpretation of the inversion results in terms of, e.g.,...
Learning to speak. Sensori-motor control of speech movements
This paper shows how an articulatory model, able to produce acoustic signals from articulatory motion, can learn to speak, i.e. coordinate its movements in such a way that it utters meaningful sequences of sounds belonging to a given language. This complex learning procedure is accomplished in four major steps: (a) a babbling phase, where the device builds up a model of the forward transforms, ...
Introducing visual cues in acoustic-to-articulatory inversion
The contribution of facial measures in a statistical acoustic-toarticulatory inversion has been investigated. The tongue contour was estimated using a linear estimation from either acoustics or acoustics and facial measures. Measures of the lateral movement of lip corners and the vertical movement of the upper and lower lip and the jaw gave a substantial improvement over the audio-only case. It...
Journal:
Speech Communication
Volume 51, Issue -
Pages -
Publication date: 2009